[TRTLLM-12500][feat] Add support for Qwen3.5 VL MoE - REVERTED by #14599 (PR #14164)

Merged: moraxu merged 9 commits into NVIDIA:main from moraxu:qwen3_5_vl_moe on May 21, 2026

@moraxu moraxu commented May 15, 2026

Collaborator

Summary by CodeRabbit

  • New Features

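    • Added support for Qwen3.5 MoE vision-language models (Qwen3_5MoeForConditionalGeneration), reusing the Qwen3-VL vision stack and the Qwen3Next text decoder.
    • Added configuration helpers that centralize HF dtype and Mamba SSM cache dtype resolution and normalize Qwen3.5 config aliases.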
  • Tests

    • Added comprehensive tests for Qwen3.5 MoE multimodal models.
    • Added MMMU accuracy evaluation and reference metrics for Qwen3.5-35B-A3B.

Description

  • Completes Qwen3.5-MoE-VL (Qwen3_5MoeForConditionalGeneration) on top of #12611.
  • Switches the VLM config path to HF's native transformers.Qwen3_5MoeConfig (available in transformers 5.3.0). Adds a thin post-load normalizer that materializes the handful of aliases the reused Qwen3Next runtime expects on text_config: intermediate_size (from the MoE fields) and rope_theta, partial_rotary_factor, and rope_scaling (from rope_parameters). Centralizes hybrid-cache dtype resolution in two helpers.
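The alias normalization described in the Description can be sketched roughly as follows. This is a minimal illustration, not the PR's _normalize_qwen35_qwen3next_text_aliases: the function name normalize_text_aliases is hypothetical, and it assumes that the MoE width comes from moe_intermediate_size and that rope_parameters is a dict carrying rope_theta and partial_rotary_factor keys.

```python
from types import SimpleNamespace
from typing import Any


def normalize_text_aliases(text_config: Any) -> Any:
    """Hypothetical sketch: copy Qwen3Next-style aliases onto a Qwen3.5 text_config.

    Assumptions (not taken from the PR): the MoE width lives in
    `moe_intermediate_size`, and `rope_parameters` is a dict that may carry
    `rope_theta` and `partial_rotary_factor` keys.
    """
    # Only fill an alias when the runtime-facing name is missing, so explicit values win.
    if getattr(text_config, "intermediate_size", None) is None:
        text_config.intermediate_size = getattr(text_config, "moe_intermediate_size", None)

    # Copy so later mutation of rope_scaling does not alias rope_parameters.
    rope_params = dict(getattr(text_config, "rope_parameters", None) or {})
    if getattr(text_config, "rope_theta", None) is None and "rope_theta" in rope_params:
        text_config.rope_theta = rope_params["rope_theta"]
    if getattr(text_config, "partial_rotary_factor", None) is None:
        text_config.partial_rotary_factor = rope_params.get("partial_rotary_factor")
    if getattr(text_config, "rope_scaling", None) is None:
        # Expose rope_parameters under the legacy rope_scaling name (None if empty).
        text_config.rope_scaling = rope_params or None
    return text_config


# Made-up values for illustration only.
cfg = SimpleNamespace(
    moe_intermediate_size=512,
    rope_parameters={"rope_type": "default", "rope_theta": 1e7, "partial_rotary_factor": 0.25},
)
normalize_text_aliases(cfg)
print(cfg.intermediate_size, cfg.rope_theta, cfg.partial_rotary_factor)  # 512 10000000.0 0.25
```

Filling aliases only when they are absent keeps the normalizer idempotent and avoids clobbering any value a checkpoint already sets explicitly.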

Test Coverage

Accuracy & unit tests
TODO: Comparison unit tests against HF

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Comment thread tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py
moraxu commented May 15, 2026

Collaborator Author

@coderabbitai full review

coderabbitai Bot commented May 15, 2026

Contributor
✅ Actions performed

Full review triggered.

coderabbitai Bot commented May 15, 2026

Contributor
Walkthrough

This PR adds Qwen3.5 MoE Vision Language Model support through dtype resolution utilities, config normalization for multimodal architectures, a new VLModel wrapper class, weight mapper registration, and comprehensive unit and integration tests.

Changes

Qwen3.5 MoE VLM Implementation

Layer / File(s) Summary
Config utilities and dtype resolution foundation
tensorrt_llm/_torch/pyexecutor/config_utils.py
Introduces _coerce_torch_dtype, resolve_hf_torch_dtype, and resolve_mamba_ssm_cache_dtype helpers to normalize HF dtype fields and Mamba cache dtypes, with extract_mamba_kv_cache_params and MambaKVCacheParams updated to use the new resolution logic.
Qwen3.5 config normalization and VLM adaptation
tensorrt_llm/_torch/pyexecutor/config_utils.py
Adds _normalize_qwen35_mrope_config, _normalize_qwen35_qwen3next_text_aliases, _normalize_qwen35_quantization_config, and _normalize_qwen35_moe_vl_config to normalize mRoPE aliases, quantization exclude-modules, and VLM model wiring. load_pretrained_config now loads qwen3_5_moe VLM checkpoints as Qwen3_5MoeConfig and applies VLM normalization.
VL base class architecture and embedding support
tensorrt_llm/_torch/models/modeling_qwen3vl.py
Qwen3VLModelBase.__init__ adds support for Qwen3_5MoeForConditionalGeneration architecture mapping, and init_mrope_embedding improves head_dim derivation via getattr with fallback to computed value.
Qwen3.5 MoE VLModel wrapper class and wiring
tensorrt_llm/_torch/models/modeling_qwen3_5.py, tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py, tensorrt_llm/_torch/models/__init__.py
Qwen3_5MoeVLModel class registered for Qwen3_5MoeForConditionalGeneration, composing vision encoder with text decoder, defining multimodal_data_device_paths, and implementing custom load_weights with conditional vision loading, namespace remapping, and Qwen3_5MoeHfWeightMapper integration. Weight mapper registered for VLM checkpoint, and model exported via __init__.
Model loader dtype resolution integration
tensorrt_llm/_torch/pyexecutor/model_loader.py
validate_and_set_mamba_ssm_cache_dtype updated to use resolve_mamba_ssm_cache_dtype and resolve_hf_torch_dtype helpers in precedence chain for dtype determination.
Qwen3Next load_weights parameter extension
tensorrt_llm/_torch/models/modeling_qwen3_next.py
Qwen3NextForCausalLM.load_weights extended with optional params_map and allow_partial_loading parameters forwarded to superclass.
Module export list formatting
tensorrt_llm/_torch/configs/__init__.py
Config module __all__ reformatted to multi-line list.
Unit tests for Qwen3.5 VLM config and model resolution
tests/unittest/_torch/modeling/test_modeling_qwen3_5_vl_moe.py
Helper function _write_qwen35_moe_vl_config and tests validate config architecture preservation, mamba_ssm_cache_dtype resolution, auto-model/mapper selection, and multimodal placeholder metadata registration.
Integration accuracy tests for Qwen3.5-35B-A3B VLM
tests/integration/defs/accuracy/references/mmmu.yaml, tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py
MMMU accuracy reference (59.0) added; new TestQwen3_5_35B_A3B_VL integration test class with memory gating, sampling configuration, and MMMU evaluation at max_batch_size=32.
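The Qwen3_5MoeVLModel row mentions a custom load_weights with conditional vision loading and namespace remapping. A minimal sketch of that idea follows; the prefixes model.visual. and model.language_model., the load_vision flag, and the helper split_vl_weights are all hypothetical, and the real logic in modeling_qwen3_5.py and Qwen3_5MoeHfWeightMapper may differ.

```python
from typing import Dict, Tuple

# Hypothetical checkpoint prefixes; the actual Qwen3.5 MoE VL layout is an assumption here.
VISION_PREFIX = "model.visual."
TEXT_PREFIX = "model.language_model."


def split_vl_weights(
    weights: Dict[str, object], load_vision: bool = True
) -> Tuple[Dict[str, object], Dict[str, object]]:
    """Partition a flat VLM state dict into (vision, text) dicts.

    Text keys are remapped from `model.language_model.*` to `model.*` so a
    text-only decoder expecting the standalone causal-LM namespace can load them.
    """
    vision: Dict[str, object] = {}
    text: Dict[str, object] = {}
    for name, tensor in weights.items():
        if name.startswith(VISION_PREFIX):
            # Skip vision weights entirely when the encoder is not loaded here.
            if load_vision:
                vision[name[len(VISION_PREFIX):]] = tensor
        elif name.startswith(TEXT_PREFIX):
            text["model." + name[len(TEXT_PREFIX):]] = tensor
        else:
            # Top-level keys such as lm_head.weight pass through unchanged.
            text[name] = tensor
    return vision, text
```

Splitting once up front lets the text half be handed to the reused Qwen3Next loader unchanged, while the vision half is loaded (or skipped) independently.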

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • 2ez4bz
  • yechank-nvidia
  • syuoni
  • xinhe-nv
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 32.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | The PR description is incomplete. While it explains the core objective (adding Qwen3.5-MoE-VL support), it lacks critical required sections. | Add a clear title with [TRTLLM-12500][feat] prefix, expand the Description section with detailed technical explanation, and replace 'TODO' with completed test coverage documentation. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title '[TRTLLM-12500][feat] Add support for Qwen3.5 VL MoE' clearly and concisely describes the main feature addition - Qwen3.5 Vision Language MoE support - matching the changeset which adds VLM infrastructure and models. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/_torch/pyexecutor/config_utils.py`:
- Around line 48-52: resolve_hf_torch_dtype and resolve_mamba_ssm_cache_dtype
call _coerce_torch_dtype on each candidate attribute but immediately return its
result, so a returned None from _coerce_torch_dtype (the "auto" sentinel)
prematurely stops the fallback chain; change both functions to only return the
coerced dtype when _coerce_torch_dtype(...) is not None, otherwise continue
scanning the remaining attributes (i.e., call getattr for each attr, call
_coerce_torch_dtype, and if the result is truthy/not None then return it; if
None keep looping and finally return None).

In `@tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py`:
- Around line 438-447: The new test method test_auto_dtype lacks an explicit
return type; update its signature to include "-> None" (i.e., def
test_auto_dtype(self) -> None:) to comply with repository typing rules and
mypy-friendly guidelines—locate the test_auto_dtype method that constructs
LLM(...) and calls task.evaluate on MMMU(...) and add the return annotation
there.
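The first inline finding describes a fallback-chain bug: a None coming back for the "auto" sentinel was returned immediately, ending the scan before later attributes were checked. A minimal sketch of the corrected pattern is below. It uses string dtype names instead of torch.dtype so it runs without PyTorch; the helper names coerce_dtype and resolve_first_dtype, the attribute list ("dtype", "torch_dtype"), and the accepted spellings are assumptions, not the PR's code.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional

# Hypothetical canonical names; the real helpers return torch.dtype objects.
_DTYPE_ALIASES = {
    "bfloat16": "bfloat16", "bf16": "bfloat16",
    "float16": "float16", "fp16": "float16", "half": "float16",
    "float32": "float32", "fp32": "float32", "float": "float32",
}


def coerce_dtype(value: Any) -> Optional[str]:
    """Return a canonical dtype name, or None for missing, "auto", or unknown values."""
    if value is None:
        return None
    # str(torch.float16) is "torch.float16", so strip that prefix (Python 3.9+).
    name = str(value).lower().removeprefix("torch.")
    if name == "auto":
        return None
    return _DTYPE_ALIASES.get(name)


def resolve_first_dtype(
    config: Any, attrs: Iterable[str] = ("dtype", "torch_dtype")  # assumed attribute order
) -> Optional[str]:
    """Return the first attribute value that coerces to a concrete dtype.

    A None result does not end the search; scanning continues with the next
    attribute, which is the behavior the review comment asks for.
    """
    for attr in attrs:
        dtype = coerce_dtype(getattr(config, attr, None))
        if dtype is not None:
            return dtype
    return None


print(resolve_first_dtype(SimpleNamespace(dtype="auto", torch_dtype="bfloat16")))  # bfloat16
```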
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b09424f2-ec07-43fb-b7b5-e73404064a0a

📥 Commits

Reviewing files that changed from the base of the PR and between d75df19 and 44ca139.

📒 Files selected for processing (11)
  • tensorrt_llm/_torch/configs/__init__.py
  • tensorrt_llm/_torch/models/__init__.py
  • tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py
  • tensorrt_llm/_torch/models/modeling_qwen3_5.py
  • tensorrt_llm/_torch/models/modeling_qwen3_next.py
  • tensorrt_llm/_torch/models/modeling_qwen3vl.py
  • tensorrt_llm/_torch/pyexecutor/config_utils.py
  • tensorrt_llm/_torch/pyexecutor/model_loader.py
  • tests/integration/defs/accuracy/references/mmmu.yaml
  • tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py
  • tests/unittest/_torch/modeling/test_modeling_qwen3_5_vl_moe.py

Comment thread tensorrt_llm/_torch/pyexecutor/config_utils.py
Comment thread tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py Outdated
Comment thread tensorrt_llm/_torch/configs/__init__.py Outdated
Comment thread tensorrt_llm/_torch/pyexecutor/config_utils.py Outdated
Comment thread tensorrt_llm/_torch/pyexecutor/config_utils.py Outdated
Comment thread tensorrt_llm/_torch/pyexecutor/config_utils.py
Comment thread tests/integration/defs/accuracy/references/mmmu.yaml
Comment thread tensorrt_llm/_torch/pyexecutor/config_utils.py Outdated
Comment thread tensorrt_llm/_torch/pyexecutor/config_utils.py Outdated
Comment thread tensorrt_llm/_torch/pyexecutor/config_utils.py Outdated
Comment thread tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py
@moraxu moraxu marked this pull request as ready for review May 18, 2026 18:21
@moraxu moraxu requested review from a team as code owners May 18, 2026 18:21
@moraxu moraxu requested a review from 2ez4bz May 18, 2026 18:21
moraxu commented May 18, 2026

Collaborator Author

/bot run

@moraxu moraxu requested a review from Tabrizian May 18, 2026 18:21
moraxu commented May 18, 2026

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #48959 [ run ] triggered by Bot. Commit: 9438b0d Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #48959 [ run ] completed with state SUCCESS. Commit: 9438b0d
/LLM/main/L0_MergeRequest_PR pipeline #38705 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

moraxu commented May 18, 2026

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #48976 [ run ] triggered by Bot. Commit: 9438b0d Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #48976 [ run ] completed with state SUCCESS. Commit: 9438b0d
/LLM/main/L0_MergeRequest_PR pipeline #38721 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

moraxu commented May 18, 2026

Collaborator Author

/bot run

moraxu commented May 18, 2026

Collaborator Author

/bot kill

@tensorrt-cicd

Collaborator

PR_Github #48987 [ run ] triggered by Bot. Commit: 9438b0d Link to invocation

nv-guomingz and others added 8 commits May 20, 2026 20:28
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
@moraxu moraxu requested a review from a team as a code owner May 21, 2026 03:36
@moraxu moraxu requested review from liji-nv and syuoni May 21, 2026 03:36
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
moraxu commented May 21, 2026

Collaborator Author

/bot help

@github-actions


GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional) pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug (experimental) --high-priority]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional) pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option is always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Supports wildcard * for pattern matching (e.g., "*PerfSanity*" matches all stages containing PerfSanity). Examples: "A10-PyTorch-1, xxx", "PerfSanity". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Supports wildcard * for pattern matching. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx", --extra-stage "Post-Merge".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

--high-priority (OPTIONAL) : Run the pipeline with high priority. This option is restricted to authorized users only and will route the job to a high-priority queue.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause top of tree to break.
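The --stage-list and --extra-stage options accept * wildcards (for example, "*PerfSanity*" matches every stage whose name contains PerfSanity). The bot's own matcher is not shown in this thread, so the sketch below only illustrates those semantics with Python's fnmatch. Note that fnmatch also treats ? and [...] specially, which the bot may not. The helper select_stages and the stage names are made up for illustration.

```python
from fnmatch import fnmatchcase
from typing import List


def select_stages(all_stages: List[str], stage_list: str) -> List[str]:
    """Return stages matching any comma-separated pattern in a --stage-list-style string."""
    patterns = [p.strip() for p in stage_list.split(",") if p.strip()]
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.fnmatch.
    return [s for s in all_stages if any(fnmatchcase(s, p) for p in patterns)]


# Hypothetical stage names.
stages = ["A10-PyTorch-1", "A10-PyTorch-2", "H100_PCIe-PerfSanity-1", "B200-PerfSanity-Post-Merge-1"]
print(select_stages(stages, "*PerfSanity*"))
# ['H100_PCIe-PerfSanity-1', 'B200-PerfSanity-Post-Merge-1']
print(select_stages(stages, "A10-PyTorch-1, H100_*"))
# ['A10-PyTorch-1', 'H100_PCIe-PerfSanity-1']
```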

moraxu commented May 21, 2026

Collaborator Author

/bot reuse-pipeline

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #49571 [ reuse-pipeline ] triggered by Bot. Commit: ee6511e Link to invocation

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #49571 [ reuse-pipeline ] completed with state SUCCESS. Commit: ee6511e
Reusing PR_Github #49455 for commit ee6511e

Link to invocation

@Tabrizian Tabrizian left a comment

Member

Reviewed py_executor/* changes and LGTM.

@moraxu moraxu merged commit 96a4a09 into NVIDIA:main May 21, 2026
7 checks passed
KleinBlueC pushed a commit to KleinBlueC/TensorRT-LLM that referenced this pull request May 26, 2026
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
KleinBlueC pushed a commit to KleinBlueC/TensorRT-LLM that referenced this pull request May 26, 2026
moraxu commented May 26, 2026

Collaborator Author

This PR was later reverted due to MTP issues, see the follow up PR here: #14599

bmarimuthu-nv pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request May 28, 2026
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
bmarimuthu-nv pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request May 28, 2026
@moraxu moraxu changed the title [TRTLLM-12500][feat] Add support for Qwen3.5 VL MoE [TRTLLM-12500][feat] Add support for Qwen3.5 VL MoE - REVERTED by #14599 May 30, 2026